
    On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research

    Scientific research requires accessing, analyzing, and sharing data distributed across heterogeneous data sources at the scale of the Internet. An eager ETL process constructs an integrated data repository as its first step, integrating and loading the data in its entirety from the data sources. Bootstrapping this process is inefficient for scientific research that requires access to data from numerous, typically very large, distributed data sources. A lazy ETL process loads only the metadata, but still does so eagerly. Lazy ETL bootstraps faster; however, queries against the integrated data repository of eager ETL run faster, because the entire dataset is available beforehand. In this paper, we propose a novel ETL approach for scientific data integration, a hybrid of the eager and lazy ETL approaches, applied to both data and metadata. Hybrid ETL thus supports incremental integration and loading of metadata and data from the data sources. We enhance the hybrid ETL with a human-in-the-loop approach: selective data integration driven by user queries, and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Obidos, and evaluate it in the context of data sharing for medical research. Obidos outperforms both the eager and lazy ETL approaches for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.
    Comment: Pre-print submitted to the DMAH Special Issue of the Springer DAPD Journal
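
    As a toy illustration of the hybrid scheme, the following Python sketch bootstraps by integrating metadata only, then materializes data selectively when a user query demands it, caching the result so later queries hit the local repository. All names here (DataSource, HybridRepository) are hypothetical stand-ins for illustration, not the actual Obidos API.

    class DataSource:
        """A remote source offering a cheap metadata call and an expensive data call."""
        def __init__(self, source_id, fetch_metadata, fetch_data):
            self.source_id = source_id
            self.fetch_metadata = fetch_metadata
            self.fetch_data = fetch_data

    class HybridRepository:
        def __init__(self):
            self._sources = {}
            self.metadata = {}  # integrated eagerly and incrementally
            self.data = {}      # materialized lazily, driven by user queries

        def register(self, source):
            # Lazy-ETL-style bootstrap: integrate metadata only, so adding a
            # source stays cheap even for very large datasets.
            self._sources[source.source_id] = source
            self.metadata[source.source_id] = source.fetch_metadata()

        def query(self, predicate):
            # Selective, query-driven integration: fetch data only for sources
            # whose metadata matches, then cache it (eager-ETL-style) so that
            # repeated queries are answered from the integrated repository.
            results = []
            for sid, meta in self.metadata.items():
                if predicate(meta):
                    if sid not in self.data:
                        self.data[sid] = self._sources[sid].fetch_data()
                    results.append(self.data[sid])
            return results

    repo = HybridRepository()
    repo.register(DataSource("s1", lambda: {"modality": "MRI"}, lambda: ["scan-1"]))
    print(repo.query(lambda meta: meta["modality"] == "MRI"))  # [['scan-1']]; later calls hit the cache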

    Latency-Sensitive Web Service Workflows: A Case for a Software-Defined Internet

    The Internet, at large, remains under the control of service providers and autonomous systems. The Internet of Things (IoT) and edge computing create an increasing demand, and potential, for more user control over web service workflows. Network softwarization revolutionizes the network landscape at various stages, from building and incrementally deploying the environment to maintaining it. Software-Defined Networking (SDN) and Network Functions Virtualization (NFV) are two core tenets of network softwarization. SDN offers a logically centralized control plane by abstracting away the control of the network devices in the data plane. NFV virtualizes dedicated hardware middleboxes and deploys them on top of servers and data centers as network functions. Network softwarization thus enables efficient management of the system by enhancing its control and improving the reusability of network services. In this work, we propose our vision for a Software-Defined Internet (SDI) for latency-sensitive web service workflows. SDI extends network softwarization to the Internet scale, enabling latency-aware execution of user workflows on the Internet.
    Comment: Accepted for publication at The Seventh International Conference on Software Defined Systems (SDS-2020)
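
    To make "latency-aware workflow execution" concrete, here is a deliberately minimal Python sketch that picks, per workflow stage, the candidate endpoint with the lowest measured latency. The data model (named stages, candidate endpoints, millisecond latencies) is an assumption for illustration; the paper's SDI is a network-layer vision, not this application-level heuristic.

    def place_workflow(stages):
        """Choose the lowest-latency endpoint per stage; return the plan and total latency.

        stages: list of dicts like
            {"name": "ingest", "candidates": {"edge-1": 3.1, "dc-east": 9.8}}
        where candidate latencies are in milliseconds.
        """
        plan, total_ms = [], 0.0
        for stage in stages:
            endpoint, latency = min(stage["candidates"].items(), key=lambda kv: kv[1])
            plan.append((stage["name"], endpoint))
            total_ms += latency
        return plan, total_ms

    # Example: a two-stage IoT workflow split between an edge node and a data center.
    plan, total_ms = place_workflow([
        {"name": "ingest",  "candidates": {"edge-1": 3.1, "dc-east": 9.8}},
        {"name": "analyze", "candidates": {"edge-1": 12.5, "dc-east": 7.4}},
    ])
    print(plan, total_ms)  # [('ingest', 'edge-1'), ('analyze', 'dc-east')] 10.5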

    CONTROL-CORE: A Framework for Simulation and Design of Closed-Loop Peripheral Neuromodulation Control Systems

    Closed-loop Vagus Nerve Stimulation (VNS) based on physiological feedback signals is a promising approach to regulating organ functions and developing therapeutic devices. Designing closed-loop neurostimulation systems requires simulation environments and computing infrastructures that support i) modeling the physiological responses of organs under neuromodulation, also known as physiological models, and ii) the interaction between the physiological models and the neuromodulation control algorithms. However, existing simulation platforms do not support modeling closed-loop VNS control systems without extensive rewriting of computer code and manual deployment and configuration of programs. The CONTROL-CORE project aims to develop a flexible software platform for designing and implementing closed-loop VNS systems. This paper proposes the software architecture and the elements of the CONTROL-CORE platform that enable a controller and a physiological model to interact in a feedback loop. CONTROL-CORE facilitates modular simulation and deployment of closed-loop peripheral neuromodulation control systems, spanning multiple organizations securely and concurrently. CONTROL-CORE allows simulations to run on different operating systems, be developed in various programming languages (such as MATLAB, Python, C++, and Verilog), and be run locally, in containers, and in a distributed fashion. The platform also lets users create tools and testbenches to facilitate sophisticated simulation experiments. We tested the CONTROL-CORE platform in the context of closed-loop control of cardiac physiological models, including pulsatile and nonpulsatile rat models, using controllers such as Model Predictive Control and Long Short-Term Memory based controllers. Our wide range of use cases and evaluations shows the performance, flexibility, and usability of the CONTROL-CORE platform.
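
    As a sketch of the controller/physiological-model feedback interaction that CONTROL-CORE orchestrates, the Python loop below couples a toy first-order heart-rate model to a proportional controller. Both the model and the controller are illustrative assumptions, not the project's actual rat models or its MPC/LSTM controllers.

    def heart_rate_model(hr, stim, dt=0.1):
        # Toy physiology: stimulation lowers the heart-rate target, and the
        # rate relaxes toward it with a 2-second time constant.
        target = 400.0 - 30.0 * stim       # bpm, rat-scale baseline (assumed)
        return hr + dt * (target - hr) / 2.0

    def controller(hr, setpoint=350.0, gain=0.5):
        # Proportional control: stimulation amplitude grows with the error
        # above the setpoint; never negative.
        return max(0.0, gain * (hr - setpoint))

    hr, stim = 400.0, 0.0
    for _ in range(600):                   # 60 s of simulated time at dt=0.1
        stim = controller(hr)              # controller reads the feedback signal
        hr = heart_rate_model(hr, stim)    # model responds to the stimulation
    print(round(hr, 1), round(stim, 2))    # ~353.1 bpm: small steady-state offset, typical of P-control

    In CONTROL-CORE the two sides of this loop can live in different languages and organizations; the sketch collapses that distribution into a single process to show the control flow.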

    A DICOM Framework for Machine Learning and Processing Pipelines Against Real-time Radiology Images

    Real-time execution of machine learning (ML) pipelines on radiology images is difficult due to limited computing resources in clinical environments, whereas running them in research clusters requires efficient data transfer capabilities. We developed Niffler, an open-source Digital Imaging and Communications in Medicine (DICOM) framework that enables ML and processing pipelines in research clusters by efficiently retrieving images from the hospitals’ PACS and extracting metadata from the images. We deployed Niffler at our institution (Emory Healthcare, the largest healthcare network in the state of Georgia) and have retrieved data from 715 scanners spanning 12 sites, continuously in real time as a DICOM data stream of up to 350 GB/day, over the past two years. We also used Niffler to retrieve images in bulk on demand, based on user-provided filters, to facilitate several research projects. This paper presents the architecture of Niffler and three such use cases. First, we executed an IVC filter detection and segmentation pipeline on abdominal radiographs in real time; it classified 989 test images with an accuracy of 96.0%. Second, we applied the Niffler Metadata Extractor to understand the operational efficiency of individual MRI systems based on calculated metrics. We benchmarked the accuracy of the calculated exam time windows by comparing Niffler against the Clinical Data Warehouse (CDW). Niffler accurately identified the scanners’ examination timeframes and idle times, whereas the CDW falsely depicted several exam overlaps due to human errors. Third, using metadata extracted from the images by Niffler, we identified scanners with misconfigured time settings and reconfigured five scanners. Our evaluations highlight how Niffler enables real-time ML and processing pipelines in a research cluster.
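
    The metadata-extraction use case can be sketched with the real pydicom library; the tag list and the grouping idea below are illustrative assumptions, not Niffler's actual configuration.

    import pydicom

    # Tags of interest; a real extraction profile would list many more.
    FIELDS = ["StationName", "Modality", "StudyDate", "SeriesTime", "AcquisitionTime"]

    def extract_metadata(dicom_path):
        # stop_before_pixels skips the large pixel data, keeping extraction
        # cheap even at hundreds of GB/day of incoming DICOM images.
        ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
        return {field: str(ds.get(field, "")) for field in FIELDS}

    Grouping the extracted records by StationName and taking the earliest and latest acquisition times per study approximates the per-exam time windows used above to measure scanner utilization and idle time.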